Evaluation of different machine learning methods for ligand-based virtual screening
Abstract
In silico high-throughput screening of large compound databases, which applies a wide range of computational methods such as machine learning [1], has become an increasingly popular approach to finding valuable drug candidates. In recent years, many studies comparing the performance of different machine learning methods in ligand-based virtual screening have been reported [2,3]. To extend these studies, we evaluated over 60 different machine learning methods. These include support vector machines (with and without parameter optimization), naïve Bayesian classifiers, decision trees, random forest, meta-classifiers (boosting, bagging, grading) and many others. All calculations were performed with the collection of machine learning algorithms for data mining implemented in the WEKA package [4]. For each method, we also examined how the fingerprint type, the size of the training set and the attribute selection method influence the recall of actives and the precision of selection. Our internal database of known 5-HT7 antagonists was used to build the training and test sets. We found that no machine learning approach consistently provides the best results, but some of them are very stable and can be applied universally.
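The abstract names naïve Bayesian classifiers among the evaluated methods. It reports results as recall of actives and precision of selection. Below is a minimal pure-Python sketch of both ideas, assuming binary fingerprints, labels of 1 (active) and 0 (inactive), and the usual definitions recall = TP / (TP + FN) and precision = TP / (TP + FP). This is not the study's code. The study used WEKA, whose own naïve Bayes implementation may differ in smoothing and attribute handling. Here the Bernoulli variant of naïve Bayes is swapped in for illustration. All function names, the 6-bit fingerprints and the toy labels are hypothetical and are not taken from the 5-HT7 data set.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A fingerprint is assumed to be a binary bit vector (1 = substructure present).
Fingerprint = Sequence[int]
# Per class: (log prior, log P(bit = 1), log P(bit = 0)) for every bit position.
Model = Dict[int, Tuple[float, List[float], List[float]]]


def train_bernoulli_nb(X: List[Fingerprint], y: List[int], alpha: float = 1.0) -> Model:
    """Fit a Bernoulli naive Bayes model with Laplace smoothing (alpha).

    Assumes labels are 1 (active) / 0 (inactive) and that both classes occur in y.
    """
    n_bits = len(X[0])
    model: Model = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        log_prior = math.log(len(rows) / len(X))
        log_on, log_off = [], []
        for j in range(n_bits):
            # Smoothed estimate of P(bit j = 1 | class).
            p = (sum(row[j] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            log_on.append(math.log(p))
            log_off.append(math.log(1.0 - p))
        model[cls] = (log_prior, log_on, log_off)
    return model


def predict(model: Model, x: Fingerprint) -> int:
    """Return the class with the highest log posterior (ties go to class 0)."""
    scores = {
        cls: log_prior + sum(log_on[j] if bit else log_off[j] for j, bit in enumerate(x))
        for cls, (log_prior, log_on, log_off) in model.items()
    }
    return max(scores, key=scores.get)


def recall_precision(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Recall of actives = TP / (TP + FN); precision = TP / (TP + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical 6-bit toy fingerprints (not real data): actives tend to set
# bits 0-2, inactives tend to set bits 3-5.
TRAIN_X = [
    [1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 1, 1], [0, 1, 0, 0, 1, 1],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]
TEST_X = [[1, 1, 0, 0, 0, 1], [0, 0, 0, 1, 1, 0], [1, 0, 0, 0, 1, 1], [1, 1, 1, 0, 1, 0]]
TEST_Y = [1, 0, 1, 1]

model = train_bernoulli_nb(TRAIN_X, TRAIN_Y)
predictions = [predict(model, x) for x in TEST_X]
recall, precision = recall_precision(TEST_Y, predictions)
print(predictions, round(recall, 3), precision)  # [1, 0, 0, 1] 0.667 1.0
```

In this toy run the model misses one active (the third test compound), so recall drops to 2/3 while precision stays at 1.0. In a study like the one described, this train-and-score step would be repeated for each fingerprint type, training-set size and attribute selection method. The sketch shows only a single split.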
Similar resources
How wrong can we get? A review of machine learning approaches and error bars.
A large number of different machine learning methods can potentially be used for ligand-based virtual screening. In our contribution, we focus on three specific nonlinear methods, namely support vector regression, Gaussian process models, and decision trees. For each of these methods, we provide a short and intuitive introduction. In particular, we will also discuss how confidence estimates (er...
Evaluation of machine-learning methods for ligand-based virtual screening
Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it ...
Kernel learning for ligand-based virtual screening: discovery of a new PPARγ agonist
We demonstrate the theoretical and practical application of modern kernel-based machine learning methods to ligand-based virtual screening by successful prospective screening for novel agonists of the peroxisome proliferator-activated receptor γ (PPARγ) [1]. PPARγ is a nuclear receptor involved in lipid and glucose metabolism, and related to type-2 diabetes and dyslipidemia. Applied methods incl...
Enhancing the Effectiveness of Ligand-Based Virtual Screening Using Data Fusion
Data fusion is being increasingly used to combine the outputs of different types of sensor. This paper reviews the application of the approach to ligand-based virtual screening, where the sensors to be combined are functions that score molecules in a database on their likelihood of exhibiting some required biological activity. Much of the literature to date involves the combination of multiple ...
Development of target-biased scoring functions for protein-ligand docking
Accurate scoring of protein-ligand interactions for docking, binding-affinity prediction and virtual screening campaigns is still challenging. Despite great efforts, the performance of existing scoring functions strongly depends on the target structure under investigation. Recent developments in the direction of target-class-specific scoring methods and machine-learning-based procedures reveal s...